ABSTRACT

Multicore CPUs and large memories are increasingly becoming the norm in modern computer systems. However, current database management systems (DBMSs) are generally ineffective in exploiting the parallelism of such systems. In particular, contention can lead to a dramatic fall in performance. In this paper, we propose a new concurrency control protocol called DGCC (Dependency Graph based Concurrency Control) that separates concurrency control from execution. DGCC builds dependency graphs for batched transactions before executing them. Using these graphs, contentions within the same batch of transactions are resolved before execution. As a result, the execution of the transactions does not need to deal with contention, while remaining fully equivalent to a serialized execution. This better exploits multicore hardware and achieves a higher level of parallelism. To facilitate DGCC, we have also proposed a system architecture that avoids certain centralized control components, yielding better scalability, and that supports a more efficient recovery mechanism. Our extensive experimental study shows that DGCC achieves up to four times higher throughput than state-of-the-art concurrency control protocols for high contention workloads.

1. INTRODUCTION 

Advancements in multicore processors over the last decade have enabled programs to significantly improve performance by exploiting parallelism. Further, the availability of larger and cheaper main memory makes it possible for a significant amount of data to reside in main memory. It is now feasible to have a single multicore system with large memory handle applications that were previously supported by multiple machines. However, current database management systems (DBMSs) are not designed to fully exploit these new hardware features. In this paper, we examine the design of multicore in-memory OLTP systems with the goal of improving the throughput of transaction processing by better exploiting modern multicore hardware. In summary, we divide transactions arriving at the DBMS into batches. Every transaction within each batch is chopped up into transaction pieces, which are reorganized into an efficient concurrent execution plan that has no contention. We present a new concurrency control protocol based on the dependencies of transactions that ensures the correctness of the execution.

We call our new concurrency control protocol Dependency Graph based Concurrency Control (DGCC). DGCC differs from traditional lock based or timestamp based protocols in that it separates the logic for concurrency control from the execution of the transactions. In traditional OLTP systems, each transaction is handled by a worker thread from its beginning to its end. The worker thread is responsible for both contention resolution and execution. Since each thread consumes system resources, there is a limit to the number of threads, and hence to the number of concurrent transactions that can be present at any one time. Furthermore, overall performance is affected by contention as well as by the inability to fully exploit parallelism. To alleviate these problems and improve scalability, DGCC first chops up a batch of transactions into transaction pieces, and then builds a dependency graph that incorporates the dependency relationships of the transaction operations. DGCC then executes these dependency graphs in a manner that guarantees that the execution of the operations is serializable. Furthermore, the execution has no contention at runtime.

We illustrate the basic idea of DGCC and compare it with the two traditional concurrency control protocols in Figure 1. For a lock based protocol, as shown in Figure 1(a), a deadlock occurs when transaction Txn1 is holding A's lock and requesting B's lock, while transaction Txn2 is holding B's lock and requesting A's lock. To break the deadlock, either transaction Txn1 or transaction Txn2 must be aborted. In a timestamp based protocol, shown in Figure 1(b), transaction Txn1's operations overlap with transaction Txn2's operations. At the validation phase of transaction Txn1, it is found that record A has been modified by transaction Txn2, which committed after Txn1 started but before Txn1's validation. This causes transaction Txn1 to be aborted. In addition, in both lock based and timestamp based protocols, the operations of one transaction must run sequentially within a single thread. As such, the two transactions in Figure 1(a) and (b) can be concurrently executed by at most two threads. In DGCC, during the dependency graph construction phase, transactions are broken down into transaction pieces, which allows the system to parallelize execution at the level of operations. More specifically, DGCC enables concurrent execution of transaction operations as long as they do not conflict. As shown in Figure 1(c), four threads are initiated to execute transactions Txn1 and Txn2, as they can simultaneously operate on four different records. If there are operations with dependencies (e.g., reads and writes of records A and B from the two transactions), DGCC executes them in order. Finally, both transactions successfully commit. In this manner, DGCC reduces the abort rate while at the same time enabling higher concurrency and guaranteeing serializability.

Txn1 = {Update(C), Update(A), Update(B)}   Txn2 = {Update(D), Update(B), Update(A)}
(a) Lock based protocol   (b) Timestamp based protocol   (c) DGCC
Figure 1: An Example with Two Transactions

DGCC consists of a graph construction phase and an execution phase, using a different work partitioning strategy for each phase. In particular, one worker thread is responsible for the construction of each dependency graph, so in the graph construction phase, n worker threads work in parallel to build n different dependency graphs at the same time. During the execution phase, if more than one transaction attempts to access the same data, the dependency graphs constructed by DGCC guarantee that these accesses are executed in a serialized manner. In general, however, this approach exposes parallelism whenever the opportunity presents itself.

DGCC is based on batch processing in a multicore in-memory system. As with any batch processing, latency is a valid concern. However, we argue that this approach is feasible in practice. First, in real applications, requests on the client side are typically sent to the server in batches so as to reduce network overhead. More importantly, in-memory systems still need to write transaction logs to disk for reliability, and group commit protocols are commonly used to reduce disk I/O cost. In other words, current systems already both receive and commit transactions in a batched manner. Secondly, in the context of in-memory multicore systems, data access is extremely fast compared to that in traditional disk-based systems, thereby reducing latency. Thirdly, the latency due to batch processing can be minimized by tuning the batch size. In summary, if the execution strategy is well designed, latency can be kept within acceptable bounds. The experiments conducted in our performance study confirm that fast batch processing is achievable.

We have implemented an in-memory OLTP system with the DGCC concurrency control protocol that supports high concurrency, efficient recovery and good scalability. The system architecture is designed for the modern multicore environment. Our experiments show that it achieves significantly higher throughput, and scales well compared to other concurrency control protocols.

In summary, this paper makes the following contributions:

• We propose DGCC, a new concurrency control protocol that separates contention resolution from execution using dependency graphs and achieves higher parallelism.

• We prototype a new in-memory multicore OLTP system supporting DGCC. Besides DGCC, it supports an efficient recovery mechanism and a customized memory allocation scheme that avoids calling the system memory allocator (malloc) at runtime.

• We conduct an extensive performance study of DGCC against three state-of-the-art concurrency control protocols. The study, using two benchmarks, shows that DGCC achieves up to four times higher throughput than the other three concurrency control protocols.

The remainder of the paper is organized as follows. In Section 2, we introduce classical concurrency control protocols. We present DGCC in Section 3, and the architecture of our prototype system in Section 4. A comprehensive evaluation is presented in Section 5, and we review related work in Section 6. Finally, the paper is concluded in Section 7.

2. EXISTING CONCURRENCY CONTROL PROTOCOLS

A transaction in a DBMS consists of a sequence of read and write operations. The DBMS must guarantee that (a) only serializable and recoverable schedules are allowed, (b) no operations of committed transactions are lost, and (c) the effects of partial transactions are not retained. In short, the DBMS is responsible for ensuring the ACID (Atomicity, Consistency, Isolation and Durability) properties.

In the multicore era, concurrency control protocols should enable multi-user programs to be interleaved and executed concurrently, with the net effect being identical to executing them in some serial order. Essentially, concurrency control protocols ensure the atomicity and isolation properties. Many research efforts have been devoted to this area. We follow the canonical categorization in [31] and review them in two categories, namely lock based and timestamp based protocols.


2.1 Lock Based Protocols 

The essential idea of lock based protocols is to use locks to control access to data. A transaction must acquire a lock on an object before it can operate on the object, which prevents unsafe interleavings of transactions. With this kind of protocol, transactions accessing data locked by other transactions may be blocked until the requested locks are released. There are at least two types of locks: write locks and read locks. A write lock is an exclusive lock, while a read lock can be a shared lock. The locking rules are usually presented as a lock compatibility table [24].
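To make the lock compatibility table concrete, the Python sketch below encodes the usual read (shared) / write (exclusive) compatibility matrix and a grant check. It models only the two lock types named above; the names `LockMode`, `COMPATIBLE` and `can_grant` are hypothetical and not taken from any particular system.

```python
from enum import Enum


class LockMode(Enum):
    READ = "shared"
    WRITE = "exclusive"


# (held mode, requested mode) -> can both locks be held at the same time?
COMPATIBLE = {
    (LockMode.READ, LockMode.READ): True,
    (LockMode.READ, LockMode.WRITE): False,
    (LockMode.WRITE, LockMode.READ): False,
    (LockMode.WRITE, LockMode.WRITE): False,
}


def can_grant(held_modes, requested):
    """Grant a request only if it is compatible with every lock that other
    transactions currently hold on the same object."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


print(can_grant([LockMode.READ, LockMode.READ], LockMode.READ))  # True
print(can_grant([LockMode.READ], LockMode.WRITE))                # False
print(can_grant([], LockMode.WRITE))                             # True
```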

Systems with lock based protocols may use a global lock manager to grant and release locks. To improve scalability, decentralized lock managers have been proposed that co-locate the lock table with the raw data.

Two-phase locking (2PL) [5, 10] is a widely used locking protocol. In the growing phase, a transaction acquires locks without releasing any. During the shrinking phase, it can only release locks without acquiring any. In a multi-programmed environment, lock based protocols have to deal with deadlocks, and transactions may be aborted when a deadlock cannot be prevented. Overall system performance is affected by transaction blocking, deadlock detection and deadlock resolution.
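Deadlock detection is commonly implemented by searching for a cycle in a wait-for graph, where an edge Ti → Tj means that Ti waits for a lock held by Tj. The sketch below is a generic depth-first search over such a graph (the function name and input format are hypothetical, not a specific system's detector), applied to the deadlock of Figure 1(a): Txn1 waits for B, held by Txn2, and Txn2 waits for A, held by Txn1.

```python
def find_deadlock(waits_for):
    """Return the transactions on one cycle of the wait-for graph, or None.

    waits_for maps each transaction to the transactions whose locks it is
    waiting for.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on the DFS path / finished
    color = {}
    path = []

    def dfs(t):
        color[t] = GREY
        path.append(t)
        for u in waits_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:            # back edge closes a cycle
                return path[path.index(u):]
            if state == WHITE:
                cycle = dfs(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in waits_for:
        if color.get(t, WHITE) == WHITE:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None


# Figure 1(a): Txn1 holds A and waits for B; Txn2 holds B and waits for A.
print(find_deadlock({"Txn1": ["Txn2"], "Txn2": ["Txn1"]}))  # ['Txn1', 'Txn2']
```

Once a cycle is found, one of its transactions is chosen as the victim and aborted, as described above.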


2.2 Timestamp Based Protocol 

Timestamp based protocols assign a global timestamp to each transaction before processing. Ordering by timestamp determines the execution order of transactions: when multiple transactions attempt to access the same data, the transaction with the smaller timestamp should be executed first. As shown in Figure 1(b), if conflicts exist during execution, the transaction is aborted and restarted.
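A minimal sketch of the basic timestamp ordering rules follows. It assumes that each record keeps the largest read and write timestamps seen so far (a textbook variant without the Thomas write rule); the class and function names are hypothetical. An operation that arrives too late relative to these timestamps forces an abort.

```python
class Abort(Exception):
    """The transaction must be aborted and restarted with a new timestamp."""


class Record:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0    # largest timestamp that has read the record
        self.write_ts = 0   # largest timestamp that has written the record


def to_read(record, ts):
    if ts < record.write_ts:   # a younger transaction already overwrote it
        raise Abort(f"read at ts={ts} after write at ts={record.write_ts}")
    record.read_ts = max(record.read_ts, ts)
    return record.value


def to_write(record, ts, value):
    # A younger transaction has already read or written the record.
    if ts < record.read_ts or ts < record.write_ts:
        raise Abort(f"write at ts={ts} arrives too late")
    record.value = value
    record.write_ts = ts


a = Record(10)
to_write(a, 2, 20)        # Txn2 (timestamp 2) updates A first
try:
    to_read(a, 1)         # Txn1 (timestamp 1) must not see Txn2's write
except Abort as e:
    print("abort:", e)
```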

Optimistic Concurrency Control (OCC) and Multi-Version Concurrency Control (MVCC) are two widely used timestamp based protocols. OCC assumes low data contention, where conflicts are rare, so transactions can complete without blocking. However, before a transaction commits, a validation is performed to check whether there is any conflict. If conflicts exist, the transaction is aborted and restarted. MVCC maintains multiple versions of each data object and is more efficient for read operations: reads can access the data of an appropriate version without being blocked by write operations. Periodic garbage collection is required to free inactive data.
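OCC's validation step can be sketched as backward validation, one common variant (not necessarily the one used by the systems discussed here): a committing transaction fails if any transaction that committed after it started wrote a record it read. The `OccTxn` class and `validate` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class OccTxn:
    start_ts: int
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validate(txn, committed):
    """Backward validation before commit.

    committed: (commit_ts, write_set) pairs of already committed transactions.
    Returns False if a transaction that committed after txn started wrote a
    record that txn has read.
    """
    return not any(
        commit_ts > txn.start_ts and (write_set & txn.read_set)
        for commit_ts, write_set in committed
    )


# Figure 1(b): Txn1 started at time 1 and read A; Txn2 wrote A and committed at time 2.
txn1 = OccTxn(start_ts=1, read_set={"C", "A", "B"}, write_set={"C", "A", "B"})
print(validate(txn1, [(2, {"D", "B", "A"})]))  # False -> abort and restart Txn1
```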

Timestamp based protocols perform poorly on workloads with high contention due to their high abort rate. Aborts not only consume computing resources, but also require additional work to undo the aborted transactions. Moreover, these protocols usually require a centralized manager to assign unique timestamps to transactions, which limits system scalability.

Table 1: Notations

| Symbol | Description |
|---|---|
| $T$ | a set of transactions |
| $\mathcal{G}$ | dependency graph |
| $s$ | a schedule of transaction execution |
| $G(s)$ | conflict graph of $s$ |
| $t_i$ | a transaction with timestamp $i$ |
| $\Phi_{t_i}$ | the set of pieces of transaction $t_i$ |
| $\phi^p_{t_i}$ | the $p$-th piece of transaction $t_i$ |
| $readset(\phi^p_{t_i})$ | the read record set of $\phi^p_{t_i}$ |
| $writeset(\phi^p_{t_i})$ | the write record set of $\phi^p_{t_i}$ |
| $accessset(\phi^p_{t_i})$ | $readset(\phi^p_{t_i}) \cup writeset(\phi^p_{t_i})$ |
| $k$ | one record stored in the database |
| $\mathcal{L}(k)$ | latest write transaction piece on $k$ |
| $\Psi(k)$ | the dominating set of $k$ |
| $\succ_{\text{time-order}}$ | timestamp ordering dependency |
| $\succ_{\text{logic}}$ | logic dependency |

3. DEPENDENCY GRAPH BASED CONCURRENCY CONTROL

In this section, we present the Dependency Graph based 
Concurrency Control (DGCC) protocol. 

Typically, arriving transactions cannot be processed by the system immediately; they first wait in a transaction queue. Unlike the worker threads in lock based and timestamp based concurrency control protocols, which process transactions one by one, DGCC grabs a batch of transactions from the transaction queue to process. The batch size depends on the number of transactions in the transaction queue and the pre-defined maximum batch size. There are two separate phases: Dependency Graph Construction and Dependency Graph Execution. Multi-threading is used in both phases for maximal parallelism. More importantly, no locks are required in the whole process, and there are no aborts due to conflicts. Table 1 summarizes the notations used in this section.

3.1 Chopping Transactions in DGCC 

Conventional concurrency control protocols process a single transaction sequentially, with no concurrent processing within a transaction. DGCC chops a transaction into a set of smaller transaction pieces according to its type and internal logic. Transactions in OLTP applications are often repetitive, and stored procedures are widely used in current systems.

A transaction piece consists of a set of stored procedures that operate on some records in the database. Each piece is represented as a vertex in our dependency graph, and it is the unit of work in both dependency graph construction and dependency graph execution. Transaction pieces may be partially ordered. We define the partial order between two transaction pieces as a logic dependency in the following subsection.

3.2 Dependency Graph Construction 

T1: τ11{r(A), r(B)}, τ12{w(A)}, τ13{w(B)}
T2: τ21{r(A), r(D), w(A)}, τ22{r(C), r(D), w(C)}
T3: τ31{r(D), w(D)}, τ32{r(A), w(A)}, τ33{r(E), w(E)}
Before adding T3: dominating set of A: {τ21}; B: {τ13}; C: {τ22}; D: {τ21, τ22}
After adding T3: dominating set of A: {τ32}; B: {τ13}; C: {τ22}; D: {τ31}; E: {τ33}
Edge types: logic dependency, time-order dependency
Figure 2: Dependency Graph Construction

During dependency graph construction, one batch of transactions is divided into several disjoint sets of transactions. A worker thread constructs a dependency graph $\mathcal{G}$ from a set of transactions $T = \{t_1, t_2, \ldots, t_n\}$. Each transaction $t_i$ is associated with a timestamp $i$. Transactions in a given set are processed in the order of their timestamps. Each transaction $t_i$ is further divided into a set of transaction pieces, $\Phi_{t_i} = \{\phi^1_{t_i}, \phi^2_{t_i}, \ldots\}$. We define two types of dependency relations on the pieces: the logic dependency relation $\succ_{\text{logic}}$ and the timestamp ordering dependency relation $\succ_{\text{time-order}}$. We first define the logic dependency relation $\succ_{\text{logic}}$.

Definition 1 (Logic Dependency). Transaction piece $\phi^q_{t_j}$ logically depends on $\phi^p_{t_i}$, denoted as $\phi^p_{t_i} \succ_{\text{logic}} \phi^q_{t_j}$, if and only if $i = j$ and $\phi^q_{t_j}$ is executed after $\phi^p_{t_i}$.

From the above definition, we can see that $\succ_{\text{logic}}$ represents the logical execution order of the pieces within one transaction. Apart from the logic dependency relation, we also need to resolve the execution order of pieces from different transactions, which is defined by the timestamp ordering dependency relation $\succ_{\text{time-order}}$. For a transaction piece $\phi^p_{t_i}$, $writeset(\phi^p_{t_i})$ and $readset(\phi^p_{t_i})$ represent the sets of records it writes and reads, respectively. The access set $accessset(\phi^p_{t_i})$ is $readset(\phi^p_{t_i}) \cup writeset(\phi^p_{t_i})$.

Definition 2 (Timestamp Ordering Dependency). A timestamp ordering dependency $\phi^p_{t_i} \succ_{\text{time-order}} \phi^q_{t_j}$ exists if and only if $j > i$ and ($writeset(\phi^p_{t_i}) \cap accessset(\phi^q_{t_j}) \neq \emptyset$ or $accessset(\phi^p_{t_i}) \cap writeset(\phi^q_{t_j}) \neq \emptyset$).
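Definition 2 can be read directly as a predicate. The Python sketch below represents a piece by its transaction timestamp and its read and write sets (the `Piece` class is a hypothetical representation, not the system's data structure) and checks it on pieces from Figure 2, written t21, t31, t33 for τ21, τ31, τ33.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    txn: int              # timestamp i of the owning transaction t_i
    name: str
    readset: frozenset
    writeset: frozenset

    @property
    def accessset(self):
        return self.readset | self.writeset


def time_order_dep(a, b):
    """True iff a ≻_time-order b under Definition 2: b belongs to a later
    transaction and one piece writes a record that the other accesses."""
    return b.txn > a.txn and bool(
        (a.writeset & b.accessset) or (a.accessset & b.writeset))


t21 = Piece(2, "t21", frozenset({"A", "D"}), frozenset({"A"}))
t31 = Piece(3, "t31", frozenset({"D"}), frozenset({"D"}))
t33 = Piece(3, "t33", frozenset({"E"}), frozenset({"E"}))
print(time_order_dep(t21, t31), time_order_dep(t21, t33))  # True False
```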

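Definition 3 (Dependency Graph). Given a set of transactions $T = \{t_1, t_2, \ldots, t_n\}$ and the associated sets of transaction pieces $\Phi_{t_1}, \Phi_{t_2}, \ldots, \Phi_{t_n}$, the dependency graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ consists of

• $\mathcal{V} = \Phi_{t_1} \cup \Phi_{t_2} \cup \cdots \cup \Phi_{t_n}$, and

• $\mathcal{E} = \{(\phi^p_{t_i}, \phi^q_{t_j})\}$ such that $\phi^p_{t_i} \in \Phi_{t_i}$, $\phi^q_{t_j} \in \Phi_{t_j}$, and $\phi^p_{t_i} \succ_{\text{logic}} \phi^q_{t_j}$ or $\phi^p_{t_i} \succ_{\text{time-order}} \phi^q_{t_j}$.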

It is not efficient to analyze $\phi^p_{t_i}$ against every piece in $\mathcal{G}$ when we add $\phi^p_{t_i}$ into $\mathcal{G}$. Furthermore, explicitly recording all timestamp ordering dependency edges between all the transaction pieces would result in a large number of edges. So during dependency graph construction, we maintain the dominating set $\Psi(k)$ for each record $k$ that is accessed in $\mathcal{G}$. Here we define the latest write transaction piece on $k$ as:

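Definition 4 (Latest Write Transaction Piece). $\mathcal{L}(k) = \phi^p_{t_i}$ such that $k \in writeset(\phi^p_{t_i})$ and $\nexists\, \phi^q_{t_j} \in \mathcal{V}$ with $j > i$ and $k \in writeset(\phi^q_{t_j})$.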


Then the dominating set $\Psi(k)$ is defined as follows:

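Definition 5 (Dominating Set). $\Psi(k) = \{\phi^p_{t_i} \mid \phi^p_{t_i} = \mathcal{L}(k) \text{ and } \nexists\, \phi^q_{t_j} \in \mathcal{V}, (j > i) \wedge k \in accessset(\phi^q_{t_j})\} \cup \{\phi^p_{t_i} \mid k \in readset(\phi^p_{t_i}) \text{ and } \nexists\, \phi^q_{t_j} \in \mathcal{V}, (j > i) \wedge k \in writeset(\phi^q_{t_j})\}$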

The dominating set $\Psi(k)$ contains only $\mathcal{L}(k)$ when there are no subsequent pieces accessing $k$; otherwise, it contains all the pieces that read $k$ after $\mathcal{L}(k)$. Hence, by maintaining the dominating set $\Psi(k)$ for each record $k$, we only need to analyze $\phi^p_{t_i}$ against the transaction pieces in $\Psi(k)$ to add edges when we insert $\phi^p_{t_i}$ into $\mathcal{G}$.

Now we can summarize the dependency graph construction algorithm for a set of transactions $T$ as Algorithm 1. We use the example in Figure 2 to illustrate the dependency graph construction process. There are three transactions, T1, T2 and T3. Our example begins after T1 and T2 have already been inserted into the dependency graph. The red directed edges represent logic dependencies and the green directed edges represent timestamp ordering dependencies.

When T3 is inserted into the dependency graph, it is divided into three pieces τ31, τ32 and τ33. For τ31, we check the dominating set of record D and add green directed edges from τ21 to τ31 and from τ22 to τ31. For τ32, we check the dominating set of record A and add a green directed edge from τ21 to τ32. For τ33, record E has no dominating set yet, and hence we just insert τ33 into $\mathcal{G}$ with no edges connected to it. Apart from adding edges into $\mathcal{G}$, we update the dominating sets according to the access set of each piece.

3.3 Dependency Graph Execution 

DGCC executes dependency graphs sequentially in a greedy manner. For a dependency graph $\mathcal{G}$, we iteratively select vertices with zero in-degree to execute and remove these vertices, as well as their outgoing edges, from the graph. This process repeats until there are no vertices left in $\mathcal{G}$. We outline the dependency graph execution in Algorithm 2. As Figure 3 shows, in the first round, we choose τ11, τ22 and τ33 to execute and remove their outgoing edges. We then iteratively select {τ12, τ13}, {τ21} and {τ32, τ31} to execute.
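The greedy round-by-round execution can be sketched as a layered topological sort. The edge list below is what we obtain by applying Algorithm 1 to the three transactions of Figure 2, assuming no logic edges beyond those listed; piece names are written t11, t12 and so on for τ11, τ12. Each returned round is one executable vertex set whose pieces could run in parallel. The function name is hypothetical.

```python
from collections import defaultdict


def execution_rounds(vertices, edges):
    """Greedy execution of a dependency graph: repeatedly take all vertices
    with zero in-degree as one executable set, then delete their outgoing
    edges. Raises ValueError if the graph has a cycle."""
    indegree = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = sorted(v for v in vertices if indegree[v] == 0)
    rounds = []
    while ready:
        rounds.append(ready)          # pieces in one round may run in parallel
        released = []
        for u in ready:
            for v in successors[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    released.append(v)
        ready = sorted(released)
    if sum(map(len, rounds)) != len(vertices):
        raise ValueError("dependency graph contains a cycle")
    return rounds


pieces = ["t11", "t12", "t13", "t21", "t22", "t31", "t32", "t33"]
edges = [("t11", "t12"), ("t11", "t13"), ("t12", "t21"),
         ("t21", "t31"), ("t22", "t31"), ("t21", "t32")]
for i, r in enumerate(execution_rounds(pieces, edges), 1):
    print(f"round {i}: {r}")
# round 1: ['t11', 't22', 't33']
# round 2: ['t12', 't13']
# round 3: ['t21']
# round 4: ['t31', 't32']
```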

3.4 Correctness 

We shall now prove that DGCC guarantees strict serializability.
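Algorithm 1: Construct the dependency graph $\mathcal{G}$ for one transaction set $T$
for $t_i$ in $T$ do
    split $t_i$ as $\Phi_{t_i}$;
    for $\phi^p_{t_i}$ in $\Phi_{t_i}$ do
        add $\phi^p_{t_i}$ into $\mathcal{G}$;
        for $k_h$ in $accessset(\phi^p_{t_i})$ do
            if $\Psi(k_h) = \emptyset$ then
                insert $\phi^p_{t_i}$ into $\Psi(k_h)$;
            else if $\Psi(k_h)$ contains only one piece $\phi^q_{t_j}$ that writes on $k_h$ then
                add edge from $\phi^q_{t_j}$ to $\phi^p_{t_i}$, representing $\phi^q_{t_j} \succ_{\text{time-order}} \phi^p_{t_i}$;
                clear $\Psi(k_h)$ and insert $\phi^p_{t_i}$ into $\Psi(k_h)$;
            else if $\phi^p_{t_i}$ only reads $k_h$ then
                add edge from $\mathcal{L}(k_h)$ (if it exists) to $\phi^p_{t_i}$, representing $\mathcal{L}(k_h) \succ_{\text{time-order}} \phi^p_{t_i}$;
                insert $\phi^p_{t_i}$ into $\Psi(k_h)$;
            else
                for $\phi^q_{t_j}$ in $\Psi(k_h)$ do
                    add edge from $\phi^q_{t_j}$ to $\phi^p_{t_i}$, representing $\phi^q_{t_j} \succ_{\text{time-order}} \phi^p_{t_i}$;
                end
                clear $\Psi(k_h)$ and insert $\phi^p_{t_i}$ into $\Psi(k_h)$;
            end
            if $\phi^p_{t_i}$ writes on $k_h$ then $\mathcal{L}(k_h) \leftarrow \phi^p_{t_i}$;
        end
    end
    add edges based on the $\succ_{\text{logic}}$ dependency;
end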
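A minimal Python sketch of Algorithm 1 follows, assuming each piece is given as a (name, readset, writeset) triple and that logic edges are supplied by the caller, since they come from each stored procedure's logic. It keeps $\Psi(k)$ and $\mathcal{L}(k)$ in dictionaries; all names are hypothetical. Run on the transactions of Figure 2, it reproduces the dominating sets shown there.

```python
from collections import defaultdict


def build_dependency_graph(transactions, logic_edges=()):
    """Sketch of Algorithm 1 for one transaction set.

    transactions: transactions in timestamp order; each is a list of pieces
        given as (name, readset, writeset).
    logic_edges: (before, after) piece pairs inside a transaction.
    Returns (vertices, edges, dominating_sets).
    """
    vertices, edges = [], set()
    dom = defaultdict(set)   # Psi(k): dominating set of record k
    latest_writer = {}       # L(k): latest write piece on record k
    writes_of = {}           # piece name -> its writeset
    for pieces in transactions:
        for name, reads, writes in pieces:
            vertices.append(name)
            writes_of[name] = set(writes)
            for k in sorted(set(reads) | set(writes)):
                psi = dom[k]
                is_write = k in writes
                if psi:
                    only = next(iter(psi))
                    if len(psi) == 1 and k in writes_of[only]:
                        edges.add((only, name))     # after the single writer
                        psi.clear()
                    elif not is_write:
                        if k in latest_writer:      # read: wait only for L(k)
                            edges.add((latest_writer[k], name))
                    else:
                        for prev in psi:            # write: wait for all readers
                            edges.add((prev, name))
                        psi.clear()
                psi.add(name)
                if is_write:
                    latest_writer[k] = name
    edges.update(logic_edges)  # logic edges do not change the dominating sets
    return vertices, edges, dict(dom)


T1 = [("t11", {"A", "B"}, set()), ("t12", set(), {"A"}), ("t13", set(), {"B"})]
T2 = [("t21", {"A", "D"}, {"A"}), ("t22", {"C", "D"}, {"C"})]
T3 = [("t31", {"D"}, {"D"}), ("t32", {"A"}, {"A"}), ("t33", {"E"}, {"E"})]
_, _, dom = build_dependency_graph([T1, T2])
print(sorted(dom["D"]))  # ['t21', 't22'], as in Figure 2 before T3 is added
```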


3.4.1 Conflict Serializability

In the previous section, the dependency graph $\mathcal{G}$ serves as a schedule $s$ of $T$. We can prove that the schedule $s$ is conflict-serializable based on the Conflict Serializability Theorem. In other words, we need to show that its conflict graph $G(s)$ is acyclic.

Definition 6 (Conflict Graph). Let $s$ be a schedule. The conflict graph $G(s) = (V, E)$ of $s$ is defined by

$V = T$

$(t_i, t_j) \in E \iff i \neq j$ and $\exists\, \phi^p_{t_i}, \phi^q_{t_j} \in \mathcal{V}$ such that $\phi^p_{t_i} \succ \phi^q_{t_j}$
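Definition 6 can be checked mechanically. The sketch below lifts piece-level edges to transaction-level edges and then uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), whose `static_order()` raises `CycleError` on a cycle, as a stand-in acyclicity test. The function names are hypothetical; an acyclic $G(s)$ yields an equivalent serial order of transactions.

```python
from graphlib import CycleError, TopologicalSorter


def conflict_graph(piece_edges, txn_of):
    """Definition 6: t_i -> t_j (i != j) whenever some piece of t_i precedes
    some piece of t_j. Returned as {transaction: set of predecessors}, the
    form graphlib.TopologicalSorter expects."""
    preds = {t: set() for t in txn_of.values()}
    for a, b in piece_edges:
        ti, tj = txn_of[a], txn_of[b]
        if ti != tj:   # edges between pieces of one transaction add nothing
            preds[tj].add(ti)
    return preds


def serial_order(piece_edges, txn_of):
    """Return an equivalent serial order of transactions, or None if G(s)
    has a cycle (the schedule is then not conflict-serializable)."""
    graph = conflict_graph(piece_edges, txn_of)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


# Piece-level edges of Figure 2 after inserting T3.
txn_of = {"t11": 1, "t12": 1, "t13": 1, "t21": 2, "t22": 2,
          "t31": 3, "t32": 3, "t33": 3}
edges = [("t11", "t12"), ("t11", "t13"), ("t12", "t21"),
         ("t21", "t31"), ("t22", "t31"), ("t21", "t32")]
print(serial_order(edges, txn_of))  # [1, 2, 3]
```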

As we only have two dependency relations, $\succ_{\text{time-order}}$ and $\succ_{\text{logic}}$, the conflict relation in the conflict graph $G(s)$ must be either $\succ_{\text{time-order}}$ or $\succ_{\text{logic}}$.

Firstly, let us consider $\succ_{\text{time-order}}$ in $G(s)$. Based on its definition, if there is a directed edge from $\phi^p_{t_i}$ to $\phi^q_{t_j}$, then the timestamp of the second piece, $j$, must be greater than that of the first piece, $i$. Now, if $G(s)$ were cyclic, we could always find a cycle with edges $(\phi^{p_0}_{t_{i_0}}, \phi^{q_1}_{t_{i_1}}), (\phi^{p_1}_{t_{i_1}}, \phi^{q_2}_{t_{i_2}}), \ldots, (\phi^{p_{v-1}}_{t_{i_{v-1}}}, \phi^{q_v}_{t_{i_v}})$, where $i_0 < i_1 < \cdots < i_{v-1} < i_v$, while closing the cycle requires $i_v = i_0$. Obviously, this violates the condition $i < j$ on at least one edge. In other words, if we only consider the $\succ_{\text{time-order}}$ dependency, $G(s)$ must be acyclic.

Next, we consider the $\succ_{\text{logic}}$ dependency. Based on its definition, $\succ_{\text{logic}}$ does not lead to an edge in $G(s)$, because $\succ_{\text{logic}}$ only exists between two pieces within the same transaction. So $G(s)$ is still acyclic.

Thus, having considered the only two possible forms of dependencies, we can conclude that $G(s)$ must be acyclic and $s$ is a conflict-serializable schedule.

Figure 3: Dependency Graph Execution

Algorithm 2: Execute one dependency graph $\mathcal{G}$
while $\mathcal{G}$ is not empty do
    select vertices with zero in-degree as $\{v_1, v_2, \ldots, v_n\}$;
    for $v_i$ in $\{v_1, v_2, \ldots, v_n\}$ do
        add $v_i$'s corresponding piece $\phi^p_{t_j}$ into the thread pool;
    end
    wait until the thread pool has no more pieces to execute;
    remove $\{v_1, v_2, \ldots, v_n\}$ and their outgoing edges from $\mathcal{G}$;
end

3.4.2 Strictness

During dependency graph construction, we have resolved all the conflicts between transactions. Therefore, when executing a dependency graph, no transaction is aborted because of conflicts. Transactions can only be aborted due to updates violating the database's schema constraints. For these, we add condition-variable-check transaction pieces. As an optimization, if there is more than one condition-variable-check transaction piece, we combine them. $\succ_{\text{logic}}$ dependency relations are inserted between the condition-variable-check piece and the other pieces in the transaction. If the condition-variable-check piece aborts, none of the other pieces in the same transaction that have $\succ_{\text{logic}}$ dependency relations with it will execute. As a consequence, no cascading aborts are possible during the execution of a dependency graph.
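The role of a condition-variable-check piece can be sketched as follows, assuming a hypothetical transfer transaction whose balance constraint is checked by one piece that the update pieces logically depend on. For brevity the sketch runs the pieces sequentially; in DGCC they would be scheduled by the graph executor. Because the check runs before any dependent piece writes, a failed check leaves nothing to undo.

```python
def make_transfer(src, dst, amount):
    """Hypothetical transfer transaction: one condition-variable-check piece
    plus two update pieces that logically depend on it."""
    def check(db):
        return db[src] >= amount   # constraint: balances stay non-negative

    def debit(db):
        db[src] -= amount

    def credit(db):
        db[dst] += amount

    return check, [debit, credit]


def run_transaction(db, check, dependent_pieces):
    """Run the check piece first. If it fails, none of the pieces that depend
    on it run, so nothing has been written and there is nothing to undo."""
    if not check(db):
        return "aborted"
    for piece in dependent_pieces:
        piece(db)
    return "committed"


db = {"A": 50, "B": 0}
print(run_transaction(db, *make_transfer("A", "B", 80)), db)
# aborted {'A': 50, 'B': 0}
```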

3.5 Differences With Transaction Chopping 

Transaction chopping is a method that divides transactions into pieces for execution, with the aim of achieving better parallelism. It guarantees the serializability of transaction execution by performing static analysis on the relations between transaction pieces, known as SC-graph analysis. However, a simple static chopping of transactions usually leads to multiple SC-cycles that have to be merged. Hence, transaction pieces remain relatively large. DGCC analyzes the relationships between transaction pieces at runtime, yielding smaller transaction pieces. This finer granularity generally gives DGCC more parallelism than transaction chopping. Furthermore, during the execution of transaction pieces, transaction chopping still requires traditional concurrency control to resolve conflicts, which can lead to aborts and restarts of transaction pieces. In DGCC's dependency graph execution, no transaction pieces abort due to conflicts.

Two transactions are shown in Figure 4, where transaction Txn1 reads records A and B while transaction Txn2 writes records A and B. Figure 4(a) shows how transaction chopping works with an SC-graph. SC-cycles in the SC-graph must be merged, so in the end there is only one piece for transaction Txn1 and one for transaction Txn2. In contrast, as illustrated in Figure 4(b), DGCC can chop both Txn1 and Txn2 into two pieces each, which means that fine-grained chopping is acceptable in DGCC.


Txn1 = {Read(A), Read(B)}   Txn2 = {Write(A), Write(B)}
(a) Transaction Chopping by SC-graph (C-edges and S-edges)   (b) Transaction Chopping in DGCC
Figure 4: Transaction Chopping


4. SYSTEM ARCHITECTURE 

This section presents the architecture of the transaction processing system we have designed to support DGCC. The system architecture consists of three major components (shown in Figure 5), namely the Execution Engine, the Storage Manager, and the Statistics Manager.

Components: OLTP Application (Database Schema, Stored Procedures, Workload Information); Statistics Manager; Transaction Initiator; Dependency Graph Execution Engine (several Dependency Graph Constructors; Dependency Graph Executor with Graph Queue and Worker Thread Pool); Recovery Manager
Figure 5: System Architecture

4.1 Execution Engine

4.1.1 Initiator

The execution engine is mainly responsible for managing transaction requests. It maintains a set of request queues, and each queue is handled by a dependency graph constructor. In some applications, transaction requests may have different priorities. The initiator adjusts the priority of each queue according to requirements, e.g., requests of higher priority are inserted into the queue with higher priority. At execution time, requests in the queue with a higher priority are processed first. By default, a transaction's priority is set according to its timestamp, i.e., a transaction with a smaller timestamp has a higher priority.
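The initiator's priority policy can be sketched with a single heap standing in for the set of prioritized request queues, an assumption made for brevity. A smaller rank means a higher priority, and requests with equal rank fall back to the default rule above: smaller timestamp first. The `Initiator` class, its methods and the request names are hypothetical.

```python
import heapq
import itertools


class Initiator:
    """Hypothetical initiator: requests are ordered by (rank, timestamp),
    where a smaller rank is a higher priority."""

    def __init__(self):
        self._heap = []
        self._next_ts = itertools.count(1)   # stand-in timestamp source

    def submit(self, request, rank=0):
        ts = next(self._next_ts)
        heapq.heappush(self._heap, (rank, ts, request))
        return ts

    def next_request(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


init = Initiator()
init.submit("new-order")
init.submit("payment")
init.submit("urgent-audit", rank=-1)
print([init.next_request() for _ in range(3)])
# ['urgent-audit', 'new-order', 'payment']
```

Because timestamps are unique, ties on rank never fall through to comparing the request objects themselves.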

4.1.2 Dependency Graph Constructor 

The constructor takes a batch of transactions from a queue and resolves their contention by building a dependency graph. The batch size is the smaller of the number of transactions in the transaction queue and a pre-defined maximum batch size. When the system is saturated, the batch size equals the maximum batch size. However, we cannot assume that the system is always saturated. After finishing one round of batch processing, the constructor checks the transaction queue. If the number of transactions waiting in the transaction queue is less than the pre-defined maximum batch size, all the available transactions are processed in this batch. The batch size in our system can thus be adjusted dynamically to suit workloads with different request rates. This strategy ensures that the system never waits indefinitely for a sufficient number of transactions to arrive before processing them.
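The batch-size rule above amounts to taking the minimum of the queue length and the maximum batch size. A short sketch, with a hypothetical function name and a `deque` standing in for the transaction queue:

```python
from collections import deque


def take_batch(queue, max_batch_size):
    """Take min(len(queue), max_batch_size) transactions from the front of the
    queue. A lightly loaded system gets a smaller batch instead of waiting for
    the queue to fill up."""
    size = min(len(queue), max_batch_size)
    return [queue.popleft() for _ in range(size)]


queue = deque(f"txn{i}" for i in range(5))
print(take_batch(queue, 3))  # ['txn0', 'txn1', 'txn2']  (saturated: full batch)
print(take_batch(queue, 3))  # ['txn3', 'txn4']          (whatever is available)
```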

For each transaction in the batch, the constructor first generates vertices according to the transaction's type and its parameters. To avoid any contention, the dependency graph constructor uses one single thread to build each dependency graph. To better exploit parallelism in the CPU, several graphs can be constructed in parallel by different threads. Each graph construction is completely independent, thereby eliminating any need for synchronization between the different threads. It is possible that there are still conflicts between different dependency graphs. We resolve this kind of conflict by processing the constructed dependency graphs sequentially, one at a time, in the Graph Executor.
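Parallel graph construction can be sketched with a thread pool, assuming (our assumption) that a batch is split into contiguous, timestamp-ordered chunks. With that split, executing the resulting graphs one at a time in chunk order also respects timestamp order across graphs. `split_batch`, `construct_graphs` and the `build_graph` callback are hypothetical names; any per-set builder, such as a Python version of Algorithm 1, could be passed in.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batch(batch, n):
    """Split a timestamp-ordered batch into at most n contiguous, disjoint sets."""
    if not batch:
        return []
    size = -(-len(batch) // n)  # ceiling division
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def construct_graphs(batch, n, build_graph):
    """Build one dependency graph per set on separate threads. The calls share
    no state, so no synchronization is needed. The result keeps the order of
    the sets, which is the order in which the graphs would be executed."""
    sets = split_batch(batch, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(build_graph, sets))


# Stand-in builder that just records which transactions went into each graph.
print(construct_graphs(list(range(10)), 3, tuple))
# [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]
```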
4.1.3 Graph Executor

After graph construction, the graph executor executes the graphs according to their priorities. From each dependency graph, the executor iteratively extracts an executable vertex set consisting of vertices with no incoming edges. The updates of these vertices do not depend on any other vertices. It follows that any two vertices in the executable vertex set have no contention. It is therefore safe to allow multiple worker threads to execute the vertices in the executable vertex set, and they can do so without requiring any coordination. When all the vertices of one graph are processed, the transactions commit and responses are sent to their clients.

In our prototype, we implemented a fixed number of threads that compete to work on either graph construction or graph execution. During dependency graph execution, if the executable vertex set at an iteration is relatively small, the overhead of context switching and competition among the worker threads outweighs the small amount of work and makes multithreading unprofitable. As an optimization, if the size of the executable vertex set is small, we assign all the work to one worker thread instead of allowing all the worker threads to compete.
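The small-set optimization can be sketched as follows, assuming the executable vertex sets are already available round by round. `SMALL_SET` is a hypothetical threshold that a real system would tune. Small sets run on the calling thread, larger sets are spread over a thread pool, and each round finishes before the next one starts.

```python
from concurrent.futures import ThreadPoolExecutor

SMALL_SET = 4  # hypothetical threshold, not a value from the paper


def execute_graph(rounds, run_piece, pool):
    """Run executable vertex sets one after another. Small sets run on the
    calling thread to avoid hand-off and contention overhead; larger sets are
    spread across the pool."""
    for vertex_set in rounds:
        if len(vertex_set) <= SMALL_SET:
            for piece in vertex_set:
                run_piece(piece)
        else:
            # list() waits for every result of the round and propagates
            # an exception raised by a piece.
            list(pool.map(run_piece, vertex_set))


executed = []
with ThreadPoolExecutor(max_workers=4) as pool:
    execute_graph([["a", "b"], ["c", "d", "e", "f", "g"], ["h"]],
                  executed.append, pool)
print(executed[:2], sorted(executed[2:7]), executed[7:])
# ['a', 'b'] ['c', 'd', 'e', 'f', 'g'] ['h']
```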


4.2 Recovery Manager 

By maintaining all data in main memory, in-memory systems significantly reduce disk I/Os and, consequently, achieve better throughput with lower latencies. However, for reliability, most in-memory systems flush transaction logs to disk and perform checkpointing periodically.
4.2.1 Transaction Logs

Before a transaction commits, the system first generates its log records and flushes them to the log files on disk. The recovery component has logger threads that are responsible for flushing the logs to disk. Traditionally, there are two kinds of logging strategies: ARIES logging and command logging [21].

In our system, the transactions of one graph commit at the same time. Instead of generating log records for a single transaction, our system constructs log records for all the transactions in a batch simultaneously. Writing all these log records at the same time fully utilizes the disk I/O bandwidth, thereby improving the system's overall performance.

Each vertex in the dependency graph has one log record consisting of the vertex's function ID, parameters, and dependency information. This information is sufficient to reconstruct the dependency graph during recovery. Our logging scheme combines the advantages of both ARIES and command logging. No actual data needs to be recorded in the log files, which reduces the size of the logs. During recovery, we only need to replay the log records to reconstruct the dependency graphs and then execute the reconstructed graphs.
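To make the recovery path concrete, the sketch below stores one record per vertex (function ID, parameters, dependency list) and rebuilds and replays the graph from those records alone. The JSON layout, the helper names, and the sequential topological order used for replay are assumptions for illustration; the system itself would re-execute the reconstructed graph with its normal parallel execution engine.

```python
import json
from collections import deque


def make_log_record(vertex_id, function_id, params, depends_on):
    # No data images are logged: only what is needed to rebuild the vertex.
    return {"vid": vertex_id, "fn": function_id, "params": params,
            "deps": sorted(depends_on)}


def rebuild_graph(lines):
    """Parse log lines back into vertices, child lists, and in-degrees."""
    records = [json.loads(line) for line in lines]
    vertices = {r["vid"]: r for r in records}
    children = {vid: [] for vid in vertices}
    indegree = {vid: len(r["deps"]) for vid, r in vertices.items()}
    for r in records:
        for parent in r["deps"]:
            children[parent].append(r["vid"])
    return vertices, children, indegree


def replay(lines, functions, db):
    """Re-execute the logged vertices in an order that respects dependencies."""
    vertices, children, indegree = rebuild_graph(lines)
    ready = deque(sorted(v for v, d in indegree.items() if d == 0))
    while ready:
        vid = ready.popleft()
        rec = vertices[vid]
        functions[rec["fn"]](db, *rec["params"])  # look up stored procedure by ID
        for child in children[vid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return db
```

For instance, three records forming the chain deposit(x, 10), double(x), deposit(x, 1) replay to x = 21 on an empty store regardless of the order in which the records appear in the file, because the dependency lists, not the file order, determine the execution order.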


4.2.2 Checkpointing

In order to recover the database within a bounded time, our system takes periodic checkpoints. The recovery component maintains several checkpointing threads. The entire memory is divided into sections, and each checkpointing thread is responsible for one section.

Transactions continue to execute even while the checkpointing threads are working. However, those commits are not reflected in the checkpoint. This means that our checkpoint is not a consistent snapshot of the database, and it must be combined with the transaction logs.

To recover from a failure, our system first reloads the latest checkpoint and replays the transaction log records written since that point in time. It then reprocesses the committed transactions.


4.3 Storage Manager

The system's storage manager maintains all the data in the database. It interacts with the execution engine to retrieve, insert, update, and delete data. Both B+-tree and hash indexes are supported.

DGCC guarantees serializability and conflict-free execution for read and write operations. However, insert and delete operations additionally require the index to be maintained correctly. Algorithms [20] that have been proposed to exploit more concurrency in indexing are orthogonal to our concurrency control protocol. Any of them can be used together with DGCC to enhance the system's overall performance.

Our system manages all of its allocated memory itself to avoid frequent invocations of memory allocation routines (such as malloc). To eliminate bottlenecks in the storage manager, the system divides the pre-allocated memory space into sections and assigns a worker thread to each section to insert and delete its data. Since OLTP applications usually perform many insert and delete operations, memory usage efficiency must also be taken into consideration: a garbage collection thread in the storage manager is invoked periodically to collect inactive objects and compact the memory space.


4.4 Statistics Manager

As shown in Figure [?], our system has a statistics manager that collects runtime statistics (such as real-time throughput and latency). It also interacts with the other components to adjust the system configuration dynamically. For example, since our system processes transactions in batches due to DGCC, the size of the dependency graph affects both throughput and latency. A larger batch size supports higher throughput, whereas a smaller batch size provides a faster response time. The maximal batch size can therefore be adjusted based on the statistics and the requirements. Furthermore, using the statistics collected from the storage manager, the system decides when to invoke the garbage collection thread.


5. EXPERIMENTS

In this section, we evaluate the effectiveness of DGCC by comparing it with the following concurrency control protocols, which are implemented in a multicore DBMS [31]:

• 2PL - Two-Phase Locking with deadlock detection

• OCC - Optimistic Concurrency Control

• MVCC - Multi-Version Concurrency Control

• DGCC - Dependency Graph based Concurrency Control

In our evaluation, general optimizations for 2PL, OCC, and MVCC are enabled to make the comparison fair. They are optimized with a customized memory allocation component to avoid malloc calls. Moreover, instead of centralized lock tables, all of them use decentralized record-level lock tables.

Figure 6: Effect of Write Operations, θ = 0.8

Table 2: Parameter Ranges for Evaluations

Parameter   Description               Range
θ           YCSB Zipfian parameter    0.0, 0.5, 0.6, 0.7, 0.8
γ           YCSB read/write ratio     4, 1, 0.25
K           worker thread number      1, 2, 3, 4, 5, 6, 7, 8
S           maximal batch size        100, 300, 500, 800, 1000, 5000, 10000, 20000

All the experimental evaluations are conducted on a server with a 24-core Intel Xeon 2.2 GHz CPU with 48 hyperthreads and 64 GB RAM. The server contains 4 NUMA nodes. To eliminate NUMA effects, we run most experiments in one NUMA node with 6 cores. Each core has a private 32 KB L1 cache and a 256 KB L2 cache, and supports two hyperthreads. The cores in the same NUMA node share a 12 MB L3 cache.

We use two popular OLTP benchmarks, namely YCSB and TPC-C [?]. YCSB is used to evaluate the performance of the concurrency control protocols under different contention rates caused by data access skewness. TPC-C simulates a complete order-entry environment whose transaction scenario is much more complex than that of YCSB. The contention rate in TPC-C is controlled by the number of warehouses.

The main purpose of concurrency control protocols is to resolve contention in a multi-programmed environment. Three factors typically dominate the intensity of contention. The first is the ratio of write operations in the workload. The second is data access skewness: frequently accessed data encounter contention more easily. The third is the number of concurrent worker threads: the more transactions run in parallel, the higher the probability of contention.

In the following experiments, we evaluate the performance of DGCC with respect to all these factors using the two benchmarks. The parameters used in the experiments are listed in Table 2, with the default settings underlined.

5.1 Read vs Write Intensive Workloads 

Since read-only transaction pieces do not generate any contention, we use a YCSB workload that has both read and write pieces.


Figure 6 shows the performance of the different concurrency control protocols on three workloads with different read/write ratios. All protocols perform better on workloads with more read pieces. As the write ratio of the workload increases, the performance of 2PL, OCC, and MVCC drops dramatically. Since more write pieces translate into a higher probability of contention, 2PL, OCC, and MVCC need to spend a lot of time resolving contention. DGCC is significantly more resilient to this increase, as there is little difference between reads and writes in the dependency graph construction phase. The performance reduction that DGCC does show is due to write pieces usually taking more time than read pieces.
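The observation that read-only pieces cause no contention corresponds to a simple rule for adding dependency edges. The sketch below works at transaction granularity with explicit read and write sets, which is a simplifying assumption (DGCC itself builds its graphs from transaction pieces), and `build_dependency_edges` is a hypothetical helper. Two transactions receive an edge only if they share a record that at least one of them writes.

```python
from itertools import combinations


def build_dependency_edges(batch):
    """batch: list of (txn_id, read_set, write_set) in batch (serial) order.

    Returns (earlier, later) edges for every pair that conflicts, i.e. the
    two transactions access a common record and at least one writes it.
    """
    edges = []
    # combinations() preserves input order, so i always precedes j in the batch.
    for (i, r1, w1), (j, r2, w2) in combinations(batch, 2):
        if (w1 & w2) or (w1 & r2) or (r1 & w2):  # write-write, write-read, read-write
            edges.append((i, j))
        # Read-read sharing adds no edge: it cannot cause contention.
    return edges
```

Since every edge points from an earlier to a later transaction in batch order, the resulting graph is acyclic, and any execution that respects it is equivalent to running the batch serially in that order. The pairwise loop is quadratic and only meant to show the rule; grouping accesses by record would avoid comparing unrelated transactions.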


5.2 Scalability 

In Figure 7, we test the performance of the four concurrency control protocols under different contention rates. The contention rate is controlled by setting the parameter θ of YCSB's Zipfian distribution. The read/write ratio γ is fixed to 1 in these experiments.

In summary, DGCC shows the best performance under different contention rates. The benefits come mainly from the separation of contention resolution and execution. By resolving contention in advance, no worker thread is blocked during execution, and the acyclicity of the dependency graph avoids aborts caused by contention.

It is notable that in Figure 7(a), 2PL has performance comparable to DGCC. In this experiment, θ = 0.5 results in the lowest contention rate. In this scenario, 2PL has little overhead because it wastes little time acquiring locks. Further, deadlocks rarely occur because the probability that more than one transaction competes for the same data is low.

The reasons for the drop in DGCC's performance when the thread count is increased (to 7 and 8 in our experiment) are twofold. First, our experiments ran on 6 cores; when more than 6 threads run concurrently, the overhead of context switching becomes significant. Second, an increase in thread count inevitably results in more contention, raising the overhead of resolving contention for all four protocols.

OCC and MVCC are timestamp based protocols. When the data access distribution is not very skewed, they scale well with the number of worker threads. However, compared with 2PL and DGCC, timestamp based protocols have to spend time assigning a timestamp to each transaction. In order to guarantee the correct serial order, such systems usually use a centralized component to perform the assignment, and this easily becomes a performance bottleneck. Moreover, at commit time, OCC and MVCC must validate that the execution is serialized according to the assigned timestamps. Whether or not a transaction aborts, transactions accessing the same data have to be validated one by one.

Figure 7: Throughput for the YCSB workload, γ = 1

Figure 8: Throughput for the TPC-C workload

The main cost of OCC and MVCC comes from processing aborts when contention is high. Unlike aborts in lock based protocols, aborts at the commit phase not only waste the processing already done but also require extra effort to eliminate the effects of the aborted transactions from the database.

Figure 8 shows the evaluation of the four protocols using the TPC-C benchmark. The contention rate in TPC-C is usually controlled by the number of warehouses. In this experiment, we set the number of warehouses to 1 to create enough contention. There are five types of transactions in TPC-C: New-Order, Payment, Delivery, Order-Status, and Stock-Level. New-Order and Payment are the most frequent transactions, accounting for almost 90% of the whole benchmark. Therefore, in addition to the entire benchmark, we also evaluate performance using each of these two transactions separately.

Figure 8(b) shows the results when only New-Order is considered. Each New-Order transaction comprises ten different items on average. The information for these items has to be read, and the related stock information needs to be updated. Which items get accessed is entirely random, which leads to a relatively low level of contention. The results shown in Figure 8(b) are as expected: although DGCC still achieves the best performance, 2PL comes in a close second.

Figure 8(c) shows the situation when only Payment transactions are involved. Each Payment transaction records a payment from a customer, and it needs to update the warehouse. With a single warehouse, these transaction pieces have to be executed serially, which severely restricts the inherent parallelism. Furthermore, the longer serial execution chain requires more iterations in DGCC's execution phase. This translates into higher overhead in areas such as work dispatch and worker thread scheduling, and consequently affects the scalability of DGCC.
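The link between serial dependencies and the number of execution iterations can be checked with a level-by-level (Kahn-style) traversal of a dependency graph. This is an illustrative sketch assuming a transaction-level graph with vertices numbered from 0; `execution_levels` is a hypothetical helper that reports how many vertices become executable in each iteration.

```python
def execution_levels(num_vertices, edges):
    """Return the number of vertices executed in each iteration of a DAG.

    Vertices are 0..num_vertices-1 and edges are (before, after) pairs.
    Each iteration runs every vertex whose predecessors have all finished,
    so len(result) is the number of iterations (longest path, in vertices).
    """
    children = [[] for _ in range(num_vertices)]
    indegree = [0] * num_vertices
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    level = [v for v in range(num_vertices) if indegree[v] == 0]
    sizes = []
    while level:
        sizes.append(len(level))  # work available in this iteration
        nxt = []
        for u in level:
            for v in children[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    nxt.append(v)
        level = nxt
    return sizes
```

With a single warehouse, every Payment updates the same warehouse record, so in this model five Payments form a chain and need five iterations of one vertex each, whereas five independent transactions finish in one iteration of five vertices.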

Figure 8(a) shows the results on the complete TPC-C workload. Because the other transactions amortize the effects of the Payment transactions, DGCC has a more balanced workload at each iteration, which makes it more scalable. However, the high contention caused by Payment transactions is still the bottleneck for the other protocols.


5.3 Data Access Distribution 

In reality, OLTP applications tend to access certain data more frequently. For example, in an online shopping scenario, popular items are accessed more frequently than others. The distribution of data accesses has a significant impact on the level of contention. YCSB assumes that data accesses follow a Zipfian distribution whose parameter θ controls the skewness. For a given number of worker threads, a larger θ translates into higher contention.

Figure 11 shows the impact of θ on the performance of the four protocols. When θ is small, data accesses are close to uniformly distributed. This is the ideal case for all the protocols.
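The effect of θ on contention can be quantified by the probability that two independent accesses hit the same record, Σ p_i², where p_i ∝ 1/i^θ is the Zipfian probability of the i-th most popular record. The sketch below is a back-of-the-envelope model rather than YCSB's generator, and the table size of 1000 records used in the note is an arbitrary illustrative choice.

```python
def zipf_probs(n, theta):
    """Access probability of ranks 1..n, with p_i proportional to 1 / i**theta."""
    weights = [1.0 / (i ** theta) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def collision_probability(n, theta):
    """Chance that two independent accesses target the same record: sum of p_i^2."""
    return sum(p * p for p in zipf_probs(n, theta))
```

For θ = 0 the collision probability is exactly 1/N, the uniform case; over the θ values used in the experiments it increases monotonically, which is consistent with the rising contention described above.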

As θ increases, the data access distribution becomes more skewed, resulting in higher contention and lower performance. Yet, compared to 2PL, OCC, and MVCC, DGCC is significantly less sensitive to increased contention. Higher contention may increase the depth of the dependency graph, and as a result more iterations are required in the execution phase. With increasing contention, the amount of concurrently executable work at each iteration also tends to decrease. However, compared to the other protocols, the overhead incurred by DGCC is lower, making DGCC more robust to data access skewness.

Figure 11: Effects of Access Distribution on YCSB, K = 8, γ = 1

5.4 Latency

In this section, we evaluate latency using the OLTP workloads. The system maintains a transaction queue to buffer arriving transactions. The size of the queue affects the average latency of the system, and it also restricts the number of transactions in each dependency graph. In the experiments, we set the default size of the transaction queue to 1000 for each worker thread.

Figures 9 and 10 show the latency of the four protocols under different workloads. The average latency of OCC and MVCC increases when there is more contention, as they require more time to perform validation at the commit phase. Furthermore, the latency of timestamp based protocols is also affected by the centralized timestamp assignment. When there is more contention, 2PL spends much time waiting for locks, which increases its latency.

In both Figures 9 and 10, the latency of DGCC is comparable with that of the other protocols. Although DGCC has a batch processing front end, the time a transaction waits in the transaction queue is much shorter than under the other protocols, so the latency of DGCC is actually lower. When transaction logging is taken into consideration, DGCC commits a group of transactions at the same time, and its log is much smaller than a traditional ARIES log. As a result, it invokes fewer system calls to flush the logs to disk and can make better use of the I/O bandwidth. Overall, this results in lower latency compared to the other protocols, confirming the efficiency of DGCC.

Figure 9: Average Latency for the YCSB workload, γ = 1

Figure 10: Average Latency for the TPC-C workload (panel (c): Payment)


5.5 Effects of Batch Size 

DGCC first constructs a dependency graph for a batch of transactions. The batch size is bounded by the number of transactions in the transaction queue and by our predefined maximal batch size S. In practice, the batch size changes dynamically: when more transactions are waiting in the transaction queue, a larger batch size is used.
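The batch-size rule above amounts to taking min(queue length, S) transactions from the head of the queue. A minimal sketch, assuming the transaction queue is a `collections.deque`; `next_batch` is a hypothetical name, not part of the described system.

```python
from collections import deque


def next_batch(queue, max_batch_size):
    """Pop the next batch from the head of the transaction queue.

    The batch holds min(len(queue), max_batch_size) transactions, so it
    grows when more transactions are waiting and never exceeds S.
    """
    size = min(len(queue), max_batch_size)
    return [queue.popleft() for _ in range(size)]
```

With seven waiting transactions and S = 5, the first call returns five transactions and the second returns the remaining two, so the batch size adapts to the queue length without exceeding S.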

Figure 12(a) shows the effects of the batch size on the TPC-C workload. When the number of worker threads is fixed, the throughput of the system increases with the batch size. The increase stops when the computation resources are fully utilized. From that point onwards, a larger dependency graph leads to higher latency.

With more worker threads, a larger batch size is needed to fully exploit their computation potential.



Figure 12: Effects of Dependency Graph Size on Throughput and Latency ((a) Throughput; (b) Latency; x-axis: batch size)


6. RELATED WORK 

Systems with lock based protocols typically require a lock manager, in which lock tables are maintained to grant and release locks. The data structures in the lock manager are usually very large and complicated, which incurs both storage and processing overheads.

Lightweight Intent Lock (LIL) was proposed to maintain a set of lightweight counters in a global lock table instead of lock queues for intent locks. It simplifies the data structure of intent locks. However, transactions that cannot obtain all their locks have to be blocked until they receive a release message from other transactions.


In order to reduce the cost of a global lock manager, [11, 18] propose keeping lock states with each data record. However, this idea requires each record to maintain a lock queue, and hence increases the burden of record management. Another approach [?] simplifies the data structure to some extent by compressing all the lock states of a record into a pair of integers. However, it achieves this by dividing the database into disjoint partitions, which sacrifices performance and scalability for workloads with high contention.

Several in-memory database prototypes that emphasize scalability on multicore systems have been proposed recently. One study [?] implemented an in-memory database prototype and evaluated the scalability of seven concurrency control methods. While the reasons differ, the overall result is that none of the methods can scale beyond 1024 cores. For lock based methods, lock thrashing and deadlock avoidance are the main bottlenecks. For timestamp based methods, the main issues are the high abort ratio and the need for centralized timestamp allocation.

Some systems [?] assume that the data in an in-memory database is partitioned, so as to remove the need for concurrency control. H-STORE [?] is a partitioned database architecture for in-memory databases. Only one thread in each partition is responsible for processing transactions, so there is no need for concurrency control within a partition. DORA [23] is similar to H-STORE in that it uses a data partitioning strategy and sends queries to each partition's worker for processing. However, unlike H-STORE, it supports concurrent execution of queries within a partition to a certain extent. Neither system scales well for skewed workloads and multi-partition transactions.

Hekaton [?], the main memory component of SQL Server, employs lock-free data structures and an OCC-based MVCC protocol to avoid applying writes until commit time. However, centralized timestamp allocation remains a bottleneck, and read operations become more expensive, since each read needs to update other transactions' dependency sets.

Silo [28] is an in-memory OLTP database prototype optimized for multicore systems. Silo supports a variant of OCC that employs batch timestamp allocation to alleviate the performance loss. However, workloads with high contention still affect its performance and scalability.

Transactional memory [13] has been shown to provide scalability with less programming complexity, and hence it attracts much attention. The systems in [19, 29] exploit hardware transactional memory by chopping transactions into small operations so that they fit in hardware transactional memory. They also adopt timestamp based protocols to ensure serializability.


7. CONCLUSION 

In this paper, we proposed DGCC, a new dependency graph based concurrency control protocol. DGCC separates concurrency control from execution by building dependency graphs for batches of transactions in order to resolve contention before execution. We showed that DGCC better exploits modern multicore hardware by achieving higher parallelism. DGCC also removes the need for centralized control components, thereby providing better scalability. We built a prototype DGCC-based OLTP system that also seamlessly integrates an efficient recovery mechanism. Our extensive experimental study on YCSB and TPC-C shows that DGCC achieves up to four times higher throughput than classical concurrency control protocols for workloads with high contention.